feat(cloud-agent): add Kilo SDK session facade #3671
Code Review Summary
Status: No Issues Found | Recommendation: Merge
Executive Summary
Incremental review of commit
Resolved Issues (cumulative)
Incremental Changes Reviewed (this pass)
Files Reviewed (cumulative)
Reviewed by claude-4.6-sonnet-20260217 · 637,738 tokens
Review guidance: REVIEW.md from base branch
2c0086b to 4618907
jeanduplessis left a comment
Assuming stale-producer fencing is addressed, the overall design looks like a good starting point. A stable /kilo facade keeps sandbox and wrapper topology out of the public SDK contract, while the existing session DO remains the lifecycle authority. That feels like the right boundary for shipping quickly without locking us into the current runtime implementation.
A few points may be worth considering before external consumers start depending on behavior:
- Event delivery semantics: the current SSE feed appears to be best-effort, without replay after disconnect. That seems reasonable for UI updates if clients reconcile through session.get() and session.messages() after reconnect. If reliable automation feeds are expected, replay cursors and persisted events may be worth thinking through sooner, since that change would be more structural.
- Projection contract: the outer routing directory is virtualized, while nested native payload fields may still include sandbox-local values. Preserving those fields is a reasonable v1 choice, but it may help to describe nested payload content as opaque so clients do not build dependencies on runtime topology.
- Cold-history contract: keeping cursors opaque and omission states explicit should preserve flexibility. It may be useful to avoid documenting storage ordering or cursor encoding as public behavior.
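To make the reconcile-after-reconnect suggestion concrete, here is a minimal sketch of how a client might treat a freshly fetched snapshot as authoritative instead of assuming missed events will be replayed. The `Message` shape and the `reconcile` helper are hypothetical and not part of the SDK; the snapshot would come from the session.get() and session.messages() calls mentioned above.

```typescript
// Hypothetical message shape; the real SDK types are richer than this.
interface Message {
  id: string;
  text: string;
}

interface ReconcileResult {
  state: Map<string, Message>;
  added: string[];
  changed: string[];
  removed: string[];
}

// Treat the freshly fetched snapshot as authoritative and report how it
// differs from the state the client built from SSE events before disconnect.
function reconcile(local: Map<string, Message>, snapshot: Message[]): ReconcileResult {
  const state = new Map<string, Message>();
  const added: string[] = [];
  const changed: string[] = [];

  for (const message of snapshot) {
    state.set(message.id, message);
    const previous = local.get(message.id);
    if (previous === undefined) {
      added.push(message.id);
    } else if (previous.text !== message.text) {
      changed.push(message.id);
    }
  }

  // Anything the client held that the snapshot no longer contains is dropped.
  const removed = Array.from(local.keys()).filter((id) => !state.has(id));
  return { state, added, changed, removed };
}
```

A client would call this once per reconnect, before resuming event handling, so that any events lost during the gap are covered by the snapshot rather than by replay.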
Some technical debt seems acceptable for a first version, with a few areas worth watching as usage grows:
- UserKiloFacade is taking on several coordination responsibilities. That is fine initially, though stable policies such as event projection and producer fencing may become good extraction points later.
- The user-scoped facade DO may become hot for users with many active sessions or subscribers. Metrics around subscriber count, event throughput, and slow-consumer eviction should make it clear whether internal sharding is ever needed.
- Warm and cold reads have separate implementations. The shared contracts and current tests are a solid base; a small conformance suite could help prevent behavior drift as the SDK surface expands.
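As an illustration of the metrics mentioned above, the following sketch models a fan-out hub with a bounded per-subscriber backlog, eviction of slow consumers, and three counters. `SubscriberHub`, `maxQueued`, and the metric names are assumptions made for this example; the PR does not describe how the facade tracks subscribers internally.

```typescript
interface HubMetrics {
  subscriberCount: number;
  eventsPublished: number;
  evictions: number;
}

// Hypothetical in-memory fan-out with a bounded backlog per subscriber.
class SubscriberHub<E> {
  private readonly queues = new Map<string, E[]>();
  private readonly maxQueued: number;
  private eventsPublished = 0;
  private evictions = 0;

  constructor(maxQueued: number) {
    this.maxQueued = maxQueued;
  }

  subscribe(id: string): void {
    this.queues.set(id, []);
  }

  // Deliver to every subscriber; evict any whose backlog is already full.
  // Returns the ids evicted by this publish.
  publish(event: E): string[] {
    this.eventsPublished += 1;
    const evicted: string[] = [];
    this.queues.forEach((queue, id) => {
      if (queue.length >= this.maxQueued) {
        evicted.push(id);
      } else {
        queue.push(event);
      }
    });
    for (const id of evicted) {
      this.queues.delete(id);
    }
    this.evictions += evicted.length;
    return evicted;
  }

  // Simulates a consumer reading and clearing its backlog.
  drain(id: string): E[] {
    const queue = this.queues.get(id);
    return queue === undefined ? [] : queue.splice(0, queue.length);
  }

  metrics(): HubMetrics {
    return {
      subscriberCount: this.queues.size,
      eventsPublished: this.eventsPublished,
      evictions: this.evictions,
    };
  }
}
```

Watching `evictions` relative to `eventsPublished` and `subscriberCount` per user is one way to tell whether a single facade object is becoming a hot spot.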
None of these look like reasons to hold the PR for a broader redesign. The main thing is keeping v1 transport details from becoming accidental public guarantees, so the implementation remains easy to evolve after shipping.
Summary
Why
Cloud Agent sessions need a stable, authenticated SDK surface so external clients can attach without depending on sandbox topology or exposing internal wrapper details. The facade gives @kilocode/sdk/v2 consumers a narrow public contract for owned root sessions while preserving durable admission, runtime fencing, and cold-history access when a wrapper is unavailable.
What was done
- /kilo facade backed by a per-user UserKiloFacade Durable Object. It supports owned root-session listing, detail and message reads, async text prompts with optional Kilo agent/model selection, abort, global SSE, and session-scoped SSE.
- /kilo-proxy path, with a persisted session-ingest fallback for session detail and transcript reads. Cold transcript pages use native cursors, bounded materialization, and omission metadata for oversized or forward-compatible unsupported items.
- prompt_async and abort through CloudAgentSession instead of directly to the wrapper so admission and interruption remain durable across cold runtimes.
- 501, typed outer routing fields are virtualized, and nested owner-visible payload content is preserved.
High-level SDK facade architecture
sequenceDiagram
    autonumber
    participant SDK as @kilocode/sdk/v2 client
    participant Worker as cloud-agent-next Worker
    participant Facade as UserKiloFacade DO
    participant SessionDO as CloudAgentSession DO
    participant Wrapper as sandbox wrapper
    participant Runtime as in-process Kilo SDK server
    participant Ingest as session-ingest / SessionIngestDO

    SDK->>Worker: Authenticated /kilo/* request
    Worker->>Facade: Route with authenticated user context

    rect rgb(235, 245, 255)
        Note over SDK,Ingest: Live-first session detail and transcript reads
        Facade->>Wrapper: GET /kilo-proxy/session/:id[/message]
        Wrapper->>Runtime: Forward SDK read
        alt Runtime is available
            Runtime-->>Wrapper: SDK-compatible response
            Wrapper-->>Facade: Return live response
        else Runtime is unavailable
            Facade->>Ingest: Read persisted snapshot or transcript page
            Ingest-->>Facade: Bounded cold projection with native cursor metadata
        end
        Facade-->>SDK: Projected public response via Worker
    end

    rect rgb(245, 245, 235)
        Note over SDK,Wrapper: Durable mutations
        SDK->>Facade: POST prompt_async or abort via Worker
        Facade->>SessionDO: Admit prompt or interrupt execution
        opt Runtime is ready
            SessionDO->>Wrapper: Dispatch admitted work
        end
        Facade-->>SDK: Accepted or idempotent response via Worker
    end

    rect rgb(240, 250, 240)
        Note over SDK,Runtime: Fenced public event stream
        Runtime-->>Wrapper: /global/event SSE
        Wrapper->>Worker: Authenticated producer WebSocket upgrade
        Worker->>SessionDO: Validate producer identity
        SessionDO-->>Worker: Producer accepted
        Worker->>Facade: Forward producer socket
        Wrapper-->>Worker: Runtime event frames
        Worker-->>Facade: Forward event frames
        Facade-->>SDK: Filtered and virtualized public SSE
    end
Architecture decision
Decision: expose Cloud Agent sessions through a narrow authenticated /kilo facade that implements the relevant @kilocode/sdk/v2 contract, rather than introducing a Cloud Agent-specific client API.
Context: existing UI clients already integrate with the Kilo SDK for session reads, conversation flows, async prompts, aborts, and event consumption. Cloud Agents provide similar capabilities, but their internal lifecycle differs: sessions can move between live sandbox runtimes and persisted history, mutations need durable admission, and wrapper topology must remain private.
Rationale: implementing the relevant Kilo SDK surface gives those clients a familiar integration path with minimal additional client-side work. The facade adapts the Cloud Agent lifecycle behind that contract: reads are live-first with bounded persisted fallback, prompts and aborts continue through CloudAgentSession, and public events are filtered and virtualized before reaching SDK consumers. The initial facade intentionally supports only the subset needed by external clients; unsupported SDK routes return 501 rather than implying broader compatibility.
Alternatives considered:
Consequences: Cloud Agents gain an adaptation layer that must preserve SDK-compatible behavior across live and cold sessions. In return, existing Kilo SDK clients can integrate with Cloud Agent-backed sessions without adopting a parallel protocol, while the internal runtime remains private and independently evolvable.
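The live-first read with persisted fallback described in the rationale can be sketched roughly as follows. This is an assumption-laden illustration, not the facade's code: `readLiveFirst`, `LiveOutcome`, and the `source` tag are hypothetical names, and the two reader functions stand in for the wrapper /kilo-proxy read and the session-ingest read.

```typescript
// Outcome of asking the live runtime; "available: false" models a cold runtime.
type LiveOutcome<T> =
  | { available: true; value: T }
  | { available: false };

interface ReadResult<T> {
  source: "live" | "cold";
  value: T;
}

// Try the live runtime first and fall back to the persisted projection only
// when the runtime reports itself unavailable. Errors from either reader are
// not swallowed, so real failures still surface to the caller.
async function readLiveFirst<T>(
  readLive: () => Promise<LiveOutcome<T>>,
  readCold: () => Promise<T>,
): Promise<ReadResult<T>> {
  const live = await readLive();
  if (live.available) {
    return { source: "live", value: live.value };
  }
  return { source: "cold", value: await readCold() };
}
```

Keeping both paths behind one function with a single return shape is one way to hold the two implementations to the same SDK-compatible contract.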
Verification
- @kilocode/sdk/v2, including session attach and reads, async chat, abort, and event consumption.
Visual Changes
N/A
Reviewer Notes
- 413 rather than an ambiguous continuation.
- CloudAgentSession after socket acceptance.
- session-ingest before cloud-agent-next; the facade consumes additive session-ingest RPCs. The cloud-agent-next deployment must include the Wrangler v5 migration for the SQLite-backed UserKiloFacade Durable Object.
- /kilo-proxy or global-feed producer paths. Recycle or drain wrappers during rollout so live SDK reads and SSE delivery consistently use the new wrapper behavior.
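For readers unfamiliar with producer fencing, the idea behind the fenced event stream can be illustrated with a generation-based fencing token. This is a generic sketch of the technique under stated assumptions, not the mechanism this PR implements: `ProducerFence`, the numeric `generation`, and both method names are hypothetical.

```typescript
// Hypothetical fence: each session tracks the newest producer generation seen,
// and event frames from any other generation are rejected as stale.
class ProducerFence {
  private readonly current = new Map<string, number>();

  // Register a producer socket. A generation older than the current one is refused.
  accept(sessionId: string, generation: number): boolean {
    const existing = this.current.get(sessionId);
    if (existing !== undefined && generation < existing) {
      return false;
    }
    this.current.set(sessionId, generation);
    return true;
  }

  // An event frame is forwarded only if it comes from the current generation.
  admit(sessionId: string, generation: number): boolean {
    return this.current.get(sessionId) === generation;
  }
}
```

With a check like this applied per frame, a wrapper that was replaced during rollout can keep its socket open without its late events reaching SDK consumers.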